This file produces a set of basic quality checks to highlight potential issues in data collection. The survey used in this report is the 2018 Mozambique SDI survey.
To start, the following figures and tables highlight missing values for a few of our key indicators. For now, ignore the missing school knowledge, operational management, management skills, instructional leadership, and ECD scores, as this information was not fully available in the SDI survey.
Teacher absence has the most missing values, followed by infrastructure, inputs, and content knowledge.
Below the missing-values plot is a table of summary statistics for a few key indicators. It shows the min, 25th percentile, median, 75th percentile, max, mean, standard deviation, total number of schools, and number of schools with missing information for each variable. The underlying data are aggregated to the school level, and the means reported are raw means, not the weighted means that will be produced in the report. These statistics are meant to give a basic idea of the data.
| var | min | q25 | median | q75 | max | mean | sd | n | number_missing |
|---|---|---|---|---|---|---|---|---|---|
| 4th Grade Student Assessment | 1 | 13.000 | 25 | 47.5 | 87.0 | 32.043011 | 22.5968327 | 338 | 59 |
| Teacher Absence | 0 | 0.000 | 22 | 50.0 | 100.0 | 32.768340 | 35.2619269 | 338 | 79 |
| Teacher Assessment | 10 | 34.000 | 41 | 49.0 | 71.0 | 41.479554 | 11.4421220 | 338 | 69 |
| TEACH Pedagogy Score | 13 | 25.125 | 29 | 33.5 | 47.5 | 29.553309 | 6.3067589 | 338 | 63 |
| Basic Inputs | 0 | 1.000 | 2 | 2.0 | 3.0 | 1.793680 | 0.8029778 | 338 | 69 |
| Basic Infrastructure | 0 | 1.000 | 1 | 2.0 | 4.0 | 1.553232 | 0.8190358 | 338 | 75 |
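A table with these columns could be built along the following lines. This is a minimal sketch, not the report's actual code: the data frame `schools`, its column names, and the helper `summarise_indicator` are hypothetical, and `n` is assumed to count all schools, including those with a missing value.

```r
# Hypothetical school-level data; names and values are illustrative only.
schools <- data.frame(
  student_knowledge = c(25, 47, NA, 13, 87),
  absence_rate      = c(0, 50, 22, NA, NA)
)

# Summarise one indicator: statistics ignore NAs, while n counts every
# school and number_missing counts the NAs.
summarise_indicator <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(
    min            = min(x, na.rm = TRUE),
    q25            = q[1],
    median         = q[2],
    q75            = q[3],
    max            = max(x, na.rm = TRUE),
    mean           = mean(x, na.rm = TRUE),  # raw, unweighted mean
    sd             = sd(x, na.rm = TRUE),
    n              = length(x),
    number_missing = sum(is.na(x))
  )
}

# One row per indicator, with the indicator name in the first column.
summary_table <- do.call(rbind, lapply(schools, summarise_indicator))
summary_table <- cbind(var = names(schools), summary_table)
rownames(summary_table) <- NULL
print(summary_table)
```

Because the statistics drop missing values while `n` does not, the number of schools behind each mean is `n - number_missing`.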
In the map below, users may click on specific provinces or regions to examine missing indicators. The slider controls which schools appear, based on the number of missing indicators. For instance, moving the slider to 4 keeps only schools that are missing four or more indicators, which indicates a relatively severe missing-data problem. In the future, I may also include checkboxes for specific survey supervisors, to examine whether any supervisors perform worse than others. I could also add filters by the day the survey took place.
The map is color coded. Green markers, for instance, are schools with no missing information on our key indicators (4th grade student achievement, teacher absence, teacher content knowledge, teacher pedagogy (TEACH), basic inputs, and basic infrastructure). Black markers are schools missing all six indicators. More indicators can be added to this list, but for now this is what we could produce from the SDI data before our own data collection.
In the following, we highlight schools whose 4th grade learning outcomes are outliers relative to their practice indicators. A simple model is estimated relating 4th grade student learning to our practice indicators: teacher absence, teacher content knowledge, teacher pedagogy (TEACH), basic inputs, and basic infrastructure. The learning outcomes are then compared to predicted values from this model. For instance, if a school scores poorly on teacher absence, content knowledge, pedagogy, inputs, and infrastructure, and thus has a low predicted value for student achievement, but in fact has very high student achievement, this may signal a problem with the quality of the data for that school. This is meant to be merely a first check of the data and does not necessarily indicate a problem.
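The count behind the slider and the color coding can be sketched as follows. The data frame, the helper `filter_by_missing`, and the intermediate color `"orange"` are assumptions for illustration; only the green (nothing missing) and black (all six missing) endpoints come from the description above.

```r
# Hypothetical school-level indicators (NA = missing); values are made up.
schools <- data.frame(
  school_id             = 1:4,
  student_knowledge     = c(30, NA, NA, 55),
  absence_rate          = c(10, NA, NA, 0),
  content_knowledge     = c(40, 35, NA, NA),
  pedagogical_knowledge = c(29, NA, NA, 31),
  inputs                = c(2, NA, NA, 1),
  infrastructure        = c(1, 2, NA, 2)
)
indicators <- setdiff(names(schools), "school_id")

# Number of missing key indicators per school (0 to 6).
schools$n_missing <- rowSums(is.na(schools[indicators]))

# Slider-style filter: keep schools missing at least `threshold` indicators.
filter_by_missing <- function(df, threshold) {
  df[df$n_missing >= threshold, ]
}

# Green = nothing missing, black = all six missing; "orange" for the
# in-between cases is an assumed placeholder, not the map's real palette.
schools$colour <- ifelse(
  schools$n_missing == 0, "green",
  ifelse(schools$n_missing == length(indicators), "black", "orange")
)

severe <- filter_by_missing(schools, 4)
print(severe[, c("school_id", "n_missing", "colour")])
```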
We model the fraction correct on the student achievement exam using the logistic functional form:
\[E(A_i|X_i)=\frac{e^{(\beta_0 + \beta_1X_i)}}{1+e^{(\beta_0 + \beta_1 X_i)}}\]
Here \(A_i\) is student achievement in fourth grade for school \(i\), and \(X_i\) is a vector of our practice indicators: teacher absence, teacher content knowledge, teacher pedagogy (TEACH), basic inputs, and basic infrastructure. The logistic functional form for the fraction correct was chosen because it can be justified using a very simple Rasch IRT model, in which the probability of answering each item correctly also follows the logistic functional form.
The map below is color coded to show where the gap between actual and predicted student achievement is largest. The slider allows users to filter on the squared error from the model (the squared difference between actual achievement and predicted achievement) to look for school locations that may be concerning. Again, a large gap does not necessarily mean there is a problem, but it should trigger more investigation.
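As a rough illustration of this specification, a logit model for a fraction in [0, 1] can be fit with `glm`. The sketch below runs on simulated data; the variable names follow the regression output in this report, but the simulated values and the choice of `quasibinomial` (which avoids the non-integer warning that `binomial` gives for fractional outcomes) are assumptions, not necessarily what the report itself used.

```r
set.seed(1)
n <- 200

# Simulated, hypothetical practice indicators for n schools.
d <- data.frame(
  absence_rate          = runif(n, 0, 100),
  content_knowledge     = rnorm(n, 40, 10),
  pedagogical_knowledge = rnorm(n, 30, 6),
  inputs                = sample(0:3, n, replace = TRUE),
  infrastructure        = sample(0:4, n, replace = TRUE)
)

# Simulated fraction correct, generated on the logit scale so it lies in (0, 1).
eta <- -2.5 + 0.02 * d$content_knowledge + 0.2 * d$inputs
d$student_knowledge_01 <- plogis(eta + rnorm(n, sd = 0.5))

# Logit-link model for E(A_i | X_i).
logitmod <- glm(
  student_knowledge_01 ~ absence_rate + content_knowledge +
    pedagogical_knowledge + inputs + infrastructure,
  family = quasibinomial(link = "logit"),
  data = d
)

# Predicted fraction correct for each school, on the response scale.
d$predicted <- predict(logitmod, type = "response")
print(round(coef(logitmod), 3))
```

The logit link guarantees that every predicted value stays strictly between 0 and 1, which a linear model for a fraction would not.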
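The squared-error filter described here amounts to a few lines. In this sketch the data frame, the helper `flag_outliers`, and the cutoff of 0.05 are all hypothetical; the cutoff stands in for whatever value the slider is set to.

```r
# Hypothetical actual and predicted fractions correct for five schools.
schools <- data.frame(
  school_id = 1:5,
  actual    = c(0.30, 0.85, 0.20, 0.50, 0.10),
  predicted = c(0.32, 0.25, 0.22, 0.45, 0.40)
)

# Squared difference between actual and predicted achievement.
schools$sq_error <- (schools$actual - schools$predicted)^2

# Keep schools at or above the chosen cutoff, largest gap first.
flag_outliers <- function(df, min_sq_error) {
  out <- df[df$sq_error >= min_sq_error, ]
  out[order(-out$sq_error), ]
}

flagged <- flag_outliers(schools, 0.05)
print(flagged)
```

Squaring treats schools that score far above and far below their prediction the same way, so the sign of `actual - predicted` is worth inspecting separately when following up on a flagged school.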
stargazer(logitmod, title="Logit Model of Outliers - Coefficients", type="html")
| | Dependent variable: student_knowledge_01 |
|---|---|
| absence_rate | 0.001 (0.005) |
| content_knowledge | 0.004 (0.014) |
| pedagogical_knowledge | 0.027 (0.026) |
| inputs | 0.177 (0.227) |
| infrastructure | 0.242 (0.206) |
| Constant | -2.456\*\* (0.961) |
| Observations | 176 |
| Log Likelihood | -98.226 |
| Akaike Inf. Crit. | 208.451 |
| Note: | Standard errors in parentheses. \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |